

1b742ae215adf18b75449c6e272fd92d-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all the reviewers for their time and effort in providing feedback. For clarity, we would like to reiterate the goal and motivation of the paper. In general, we do not have access to the target network, but only to the labeled training data. As optimizing a ReLU neural network is itself NP-hard in general, we expect all algorithms to be inefficient in the worst case. Thus, the approximated network achieved 97.17% test set accuracy with 44.69% sparsity.
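The sparsity figure quoted above is conventionally the fraction of weights that have been zeroed out in the approximating network. As a minimal illustrative sketch (the function name and weight layout are ours, not the paper's):

```python
def sparsity(weight_matrices):
    """Fraction of exactly-zero weights across a list of 2-D weight matrices.

    A 44.69% sparsity, as in the feedback above, would mean that roughly
    0.4469 of all weights have been pruned to zero.
    """
    total = zeros = 0
    for matrix in weight_matrices:
        for row in matrix:
            total += len(row)
            zeros += sum(1 for w in row if w == 0.0)
    return zeros / total
```

Note that this counts exact zeros; in practice a small threshold is sometimes used instead, since near-zero weights may survive floating-point training.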


Theoretical Analysis of the Inductive Biases in Deep Convolutional Networks

Neural Information Processing Systems

In this paper, we provide a theoretical analysis of the inductive biases in convolutional neural networks (CNNs). We start by examining the universality of CNNs, i.e., the ability to approximate any continuous function. We prove that a depth of $\mathcal{O}(\log d)$ suffices for deep CNNs to achieve this universality, where $d$ is the input dimension. Additionally, we establish that learning sparse functions with CNNs requires only $\widetilde{\mathcal{O}}(\log^2d)$ samples, indicating that deep CNNs can efficiently capture {\em long-range} sparse correlations. These results are made possible through a novel combination of multichanneling and downsampling when increasing the network depth.
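The $\mathcal{O}(\log d)$ depth claim can be made concrete with a back-of-the-envelope sketch: each stride-2 downsampling layer halves the spatial extent, so roughly $\lceil\log_2 d\rceil$ layers suffice for the receptive field to span all $d$ input positions. A minimal simulation (our own illustration of the counting argument, not the paper's construction):

```python
import math

def layers_to_cover(d):
    """Count stride-2 downsampling layers needed to reduce a length-d
    signal to a single position, i.e., a receptive field spanning the input."""
    layers = 0
    size = d
    while size > 1:
        size = math.ceil(size / 2)  # stride-2, kernel-2 downsampling
        layers += 1
    return layers

# Depth grows logarithmically with the input dimension d:
for d in (16, 256, 1024):
    assert layers_to_cover(d) == math.ceil(math.log2(d))
```

This only captures the receptive-field side of the argument; the paper's result also relies on growing the number of channels as depth increases.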


Supplementary Material: Aligned Structured Sparsity Learning for Efficient Image Super-Resolution

Neural Information Processing Systems

Our proposed aligned structured sparsity learning (ASSL) algorithm is summarized in Algorithm 1. There are in total 16 residual blocks in EDSR_baseline. We provide more visual comparisons in Figure 1. In contrast, our ASSLN can better recover structural details. Our ASSLN can also better alleviate the blurring artifacts.
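Structured sparsity of this kind is commonly induced with a group-Lasso penalty, where each filter forms one group so that entire filters are driven to zero together and can then be pruned as whole channels. A minimal sketch of such a penalty (a generic group-Lasso term, not the exact ASSL regularizer, and the flattened-filter layout is our simplification):

```python
import math

def group_lasso_penalty(filters):
    """Sum of per-filter L2 norms: ||w_1||_2 + ... + ||w_K||_2,
    where each filter is given as a flat list of its weights.

    Unlike a plain L1 penalty, this pushes whole filters (groups) toward
    zero, yielding structured sparsity that maps directly to channel pruning.
    """
    return sum(math.sqrt(sum(w * w for w in f)) for f in filters)
```

In residual networks such as EDSR, the pruned filter indices must additionally be kept aligned across the blocks that share a skip connection, which is the "aligned" part of ASSL.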



Sparks of Explainability: Recent Advancements in Explaining Large Vision Models

Fel, Thomas

arXiv.org Artificial Intelligence

This thesis explores advanced approaches to improve explainability in computer vision by analyzing and modeling the features exploited by deep neural networks. Initially, it evaluates attribution methods, notably saliency maps, by introducing a metric based on algorithmic stability and an approach utilizing Sobol indices, which, through quasi-Monte Carlo sequences, allows a significant reduction in computation time. In addition, the EVA method offers a first formulation of attribution with formal guarantees via verified perturbation analysis. Experimental results indicate that in complex scenarios these methods do not provide sufficient understanding, particularly because they identify only "where" the model focuses without clarifying "what" it perceives. Two hypotheses are therefore examined: aligning models with human reasoning -- through the introduction of a training routine that integrates the imitation of human explanations and optimization within the space of 1-Lipschitz functions -- and adopting a conceptual explainability approach. The CRAFT method is proposed to automate the extraction of the concepts used by the model and to assess their importance, complemented by MACO, which enables their visualization. These works converge towards a unified framework, illustrated by an interactive demonstration applied to the 1000 ImageNet classes in a ResNet model.
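For readers unfamiliar with Sobol indices: a first-order index measures the share of output variance explained by one input alone, and the attribution method above estimates such indices over image perturbations using quasi-Monte Carlo sequences. A plain Monte Carlo version of the standard Saltelli pick-and-freeze estimator (our simplified illustration, not the thesis's implementation, which uses quasi-Monte Carlo for efficiency):

```python
import random

def sobol_first_order(f, dim, n=20000, seed=0):
    """Plain Monte Carlo pick-and-freeze estimate of first-order Sobol indices.

    S_i = V_i / Var(f), where V_i is the output variance explained by input i
    alone. Uses the Saltelli-style estimator
        V_i ~ mean of f(B) * (f(AB_i) - f(A)),
    where AB_i equals sample A except that coordinate i is taken from B.
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(dim)] for _ in range(n)]
    B = [[rng.random() for _ in range(dim)] for _ in range(n)]
    fA = [f(x) for x in A]
    fB = [f(x) for x in B]
    mean = sum(fA + fB) / (2 * n)
    var = sum((y - mean) ** 2 for y in fA + fB) / (2 * n)
    indices = []
    for i in range(dim):
        fABi = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        Vi = sum(fb * (fab - fa) for fb, fab, fa in zip(fB, fABi, fA)) / n
        indices.append(Vi / var)
    return indices

# For f(x) = x0 + 2*x1 with uniform inputs, the exact indices are 0.2 and 0.8.
S = sobol_first_order(lambda x: x[0] + 2 * x[1], dim=2)
```

In the attribution setting, `f` is the model's score on a perturbed image and each "input" is a perturbation mask region, so the index ranks regions by their contribution to the prediction.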


RenderNet: A deep convolutional network for differentiable rendering from 3D shapes

Neural Information Processing Systems

Traditional computer graphics rendering pipelines are designed for procedurally generating 2D images from 3D shapes with high performance. The nondifferentiability due to discrete operations (such as visibility computation) makes it hard to explicitly correlate rendering parameters and the resulting image, posing a significant challenge for inverse rendering tasks. Recent work on differentiable rendering achieves differentiability either by designing surrogate gradients for non-differentiable operations or via an approximate but differentiable renderer. These methods, however, are still limited when it comes to handling occlusion, and restricted to particular rendering effects. We present RenderNet, a differentiable rendering convolutional network with a novel projection unit that can render 2D images from 3D shapes.
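One standard way to soften the non-differentiable visibility test, conceptually related to projecting a 3D representation into 2D though far simpler than RenderNet's learned projection unit, is to replace the hard arg-max along depth with a softmax-weighted combination. A hedged sketch (all names and the nested-list voxel layout are ours):

```python
import math

def soft_projection(voxels, temperature=1.0):
    """Differentiable projection of a D x H x W occupancy grid to an H x W image.

    Each pixel combines its depth column with softmax weights instead of a
    hard visibility test, so gradients flow to every voxel in the column.
    This is an illustrative stand-in, not RenderNet's learned projection unit.
    """
    D, H, W = len(voxels), len(voxels[0]), len(voxels[0][0])
    image = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            column = [voxels[z][y][x] for z in range(D)]
            weights = [math.exp(v / temperature) for v in column]
            total = sum(weights)
            image[y][x] = sum(w * v for w, v in zip(weights, column)) / total
    return image
```

As the temperature goes to zero the softmax approaches the hard maximum over depth, recovering a winner-takes-all visibility decision at the cost of vanishing gradients.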


Multimodal Trajectory Prediction for Autonomous Driving on Unstructured Roads using Deep Convolutional Network

Li, Lei, Chen, Zhifa, Wang, Jian, Zhou, Bin, Yu, Guizhen, Chen, Xiaoxuan

arXiv.org Artificial Intelligence

Recently, the application of autonomous driving in open-pit mining has garnered increasing attention for achieving safe and efficient mineral transportation. Compared to urban structured roads, unstructured roads in mining sites have uneven boundaries and lack clearly defined lane markings. This leads to a lack of sufficient constraint information for predicting the trajectories of other human-driven vehicles, resulting in higher uncertainty in trajectory prediction problems. A method is proposed to predict multiple possible trajectories of the target vehicle, together with their probabilities. The surrounding environment and historical trajectories of the target vehicle are encoded as a rasterized image, which is used as input to our deep convolutional network to predict the target vehicle's multiple possible trajectories. The method underwent offline testing on a dataset specifically designed for autonomous driving scenarios in open-pit mining and was compared and evaluated against a physics-based method.
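Multimodal predictors of this kind typically output K candidate trajectories plus one probability per mode and are commonly trained with a winner-takes-all loss: only the mode closest to the ground truth is regressed, while its probability is pushed up. A minimal sketch of that common formulation (not necessarily this paper's exact loss; names are ours):

```python
import math

def winner_takes_all_loss(pred_trajs, mode_probs, gt_traj):
    """pred_trajs: K candidate trajectories, each a list of (x, y) points.
    mode_probs: K mode probabilities summing to 1.
    gt_traj: ground-truth list of (x, y) points.

    Returns (loss, best_mode): squared L2 error of the closest mode plus a
    cross-entropy term encouraging high probability on that winning mode.
    """
    errors = [
        sum((px - gx) ** 2 + (py - gy) ** 2
            for (px, py), (gx, gy) in zip(traj, gt_traj))
        for traj in pred_trajs
    ]
    best = min(range(len(errors)), key=errors.__getitem__)
    return errors[best] - math.log(mode_probs[best]), best
```

Regressing only the closest mode keeps the K hypotheses from collapsing onto a single average trajectory, which is what makes the output genuinely multimodal.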


A Goal-Driven Approach to Systems Neuroscience

Nayebi, Aran

arXiv.org Artificial Intelligence

Humans and animals exhibit a range of interesting behaviors in dynamic environments, and it is unclear how our brains actively reformat this dense sensory information to enable these behaviors. Experimental neuroscience is undergoing a revolution in its ability to record and manipulate hundreds to thousands of neurons while an animal is performing a complex behavior. As these paradigms enable unprecedented access to the brain, a natural question that arises is how to distill these data into interpretable insights about how neural circuits give rise to intelligent behaviors. The classical approach in systems neuroscience has been to ascribe well-defined operations to individual neurons and provide a description of how these operations combine to produce a circuit-level theory of neural computations. While this approach has had some success for small-scale recordings with simple stimuli designed to probe a particular circuit computation, these efforts often ultimately lead to disparate descriptions of the same system across stimuli. Perhaps more strikingly, many response profiles of neurons are difficult to succinctly describe in words, suggesting that new approaches are needed in light of these experimental observations. In this thesis, we offer a different definition of interpretability that we show has promise in yielding unified structural and functional models of neural circuits, and describes the evolutionary constraints that give rise to the response properties of the neural population, including those that have previously been difficult to describe individually. We demonstrate the utility of this framework across multiple brain areas and species to study the roles of recurrent processing in the primate ventral visual pathway; mouse visual processing; heterogeneity in rodent medial entorhinal cortex; and facilitating biological learning.